Skip to content

Conversation

@pmachapman
Copy link
Collaborator

@pmachapman pmachapman commented Oct 9, 2025

Fixes #782.

I couldn't find a way to create an effective unit test, except by making the OutboxDeliveryService a partial class, and exposing some of the methods in it for mocking. I did test it locally by stopping the MongoDB container, and the serval-api container did not crash when this occurred, but logged the messages and waited before reconnecting (as Hangfire does).


This change is Reviewable

@pmachapman pmachapman requested review from Enkidu93 and ddaspit October 9, 2025 02:04
@codecov-commenter
Copy link

codecov-commenter commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 64.28571% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.10%. Comparing base (1731e1b) to head (19aa5fe).

Files with missing lines Patch % Lines
...L.ServiceToolkit/Services/OutboxDeliveryService.cs 64.28% 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #788   +/-   ##
=======================================
  Coverage   66.09%   66.10%           
=======================================
  Files         370      370           
  Lines       20008    20018   +10     
  Branches     2618     2619    +1     
=======================================
+ Hits        13224    13232    +8     
- Misses       5853     5856    +3     
+ Partials      931      930    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Enkidu93 reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/OutboxDeliveryService.cs line 29 at r1 (raw file):

            try
            {
                await subscription.WaitForChangeAsync(cancellationToken: stoppingToken);

Is there anywhere else this could happen (e.g., waiting for a new build revision)? Is it possible to make this solution more generic?

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit and @pmachapman)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/OutboxDeliveryService.cs line 29 at r1 (raw file):

            try
            {
                await subscription.WaitForChangeAsync(cancellationToken: stoppingToken);

Also, it's unlikely to make a big difference, but I think #440 is perhaps still unaddressed. I wonder if it's worth fixing that problem here if it hasn't been fixed already.

Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit and @Enkidu93)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/OutboxDeliveryService.cs line 29 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Also, it's unlikely to make a big difference, but I think #440 is perhaps still unaddressed. I wonder if it's worth fixing that problem here if it hasn't been fixed already.

Unless mistaken, I don't think resuming change streams will help here as MongoSubscription.WaitForChangeAsync() opens a new stream anyway every time it is called. If the change stream fails due to connection timeout or cancellation, a new change stream will be opened when the loop iterates and WaitForChangeAsync is called again.


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/OutboxDeliveryService.cs line 29 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Is there anywhere else this could happen (e.g., waiting for a new build revision)? Is it possible to make this solution more generic?

The key issue is when the error occurs in a background service. From my understanding of Serval,. and experiments locally, I couldn't see any other background services that would fail in this way when Mongo is temporarily down.

If the connection was to timeout in BuildService.GetNewerRevisionAsync(), then the API endpoint which called this method would fail, and the client would call it again. I think that is OK.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/OutboxDeliveryService.cs line 29 at r1 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Unless mistaken, I don't think resuming change streams will help here as MongoSubscription.WaitForChangeAsync() opens a new stream anyway every time it is called. If the change stream fails due to connection timeout or cancellation, a new change stream will be opened when the loop iterates and WaitForChangeAsync is called again.

Yeah, I do think it could be helpful if we used resume tokens, but that's a change that'd be internal to the data access code. We can tackle that separately.


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/OutboxDeliveryService.cs line 29 at r1 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

The key issue is when the error occurs in a background service. From my understanding of Serval,. and experiments locally, I couldn't see any other background services that would fail in this way when Mongo is temporarily down.

If the connection was to timeout in BuildService.GetNewerRevisionAsync(), then the API endpoint which called this method would fail, and the client would call it again. I think that is OK.

OK, I see. Makes sense!

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddaspit reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/OutboxDeliveryService.cs line 30 at r1 (raw file):

            {
                await subscription.WaitForChangeAsync(cancellationToken: stoppingToken);
                if (stoppingToken.IsCancellationRequested)

We should call ThrowIfCancellationRequested() here. This will make the behavior more consistent, i.e. we will log the cancellation in all cases.

@pmachapman pmachapman force-pushed the outbox_service_handle_timeout branch from 9269917 to 19aa5fe Compare October 14, 2025 18:54
Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)


src/ServiceToolkit/src/SIL.ServiceToolkit/Services/OutboxDeliveryService.cs line 30 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

We should call ThrowIfCancellationRequested() here. This will make the behavior more consistent, i.e. we will log the cancellation in all cases.

Done. Good idea - thank you!

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@ddaspit reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

@pmachapman pmachapman merged commit bba13f7 into main Oct 14, 2025
3 checks passed
@pmachapman pmachapman deleted the outbox_service_handle_timeout branch October 14, 2025 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle transient MongoDB errors more gracefully

5 participants